Classification of Research Papers into a Patent Classification System Using Two Translation Models

Authors

  • Hidetsugu Nanba
  • Toshiyuki Takezawa
Abstract

Classifying research papers into patent classification systems enables exhaustive and effective invalidity searches, prior art searches, and technical trend analyses. However, classifying research papers manually is very costly. We have therefore studied the automatic classification of research papers into a patent classification system. Such classification must take into account the differences between the terms used in research papers and those used in patents, because the terms used in patents are often more abstract or creative than those used in research papers in order to widen the scope of the claims. Exhaustive searches and analyses also require classifying research papers written in various languages. To solve these problems, we propose classification methods that use two machine translation models. When translating English research papers into Japanese, a translation model for patents performs worse than a model for research papers because of the differences in the terms used in research papers and patents. However, the model for patents is thought to be useful for our task because its translation results tend to contain more patent terms than those of a model for research papers. To confirm the effectiveness of our methods, we conducted experiments using the data of the Patent Mining Task in the NTCIR-7 Workshop. The experimental results showed that our method using translation models for both research papers and patents was more effective than using a single translation model.
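The idea of using two translation models together can be illustrated with a minimal Python sketch. The abstract does not say which classifier the authors use or how the two translations are combined, so everything below is an assumption made for illustration: the token-overlap scorer, the weighted sum in `combine_scores`, the `weight` parameter, the toy token lists (English stand-ins for Japanese translation output), and the two example IPC subclasses.

```python
from collections import Counter


def score_ipc_codes(tokens, patents_by_code):
    """Score each IPC code by token overlap between a translated paper
    and the patent text filed under that code (hypothetical scorer)."""
    query = Counter(tokens)
    scores = {}
    for code, patent_tokens in patents_by_code.items():
        patent = Counter(patent_tokens)
        # Overlap = sum of the smaller count for every query token.
        scores[code] = sum(min(n, patent[tok]) for tok, n in query.items())
    return scores


def combine_scores(scores_a, scores_b, weight=0.5):
    """Weighted sum of two score tables (an assumed combination rule)."""
    codes = set(scores_a) | set(scores_b)
    return {
        code: weight * scores_a.get(code, 0) + (1 - weight) * scores_b.get(code, 0)
        for code in codes
    }


def rank_codes(scores):
    """Return IPC codes ordered by descending score, ties broken by code."""
    return sorted(scores, key=lambda code: (-scores[code], code))


# Toy patent collection: tokens of patents filed under each IPC code.
patents_by_code = {
    "G11C": ["storage", "device", "circuit", "storage"],
    "H01L": ["chip", "semiconductor", "substrate"],
}

# The same paper as rendered by two hypothetical translation models:
# the paper-domain model keeps scholarly wording, the patent-domain
# model produces more patent-style terms.
paper_model_tokens = ["memory", "chip", "circuit"]
patent_model_tokens = ["storage", "device", "circuit"]

scores_paper = score_ipc_codes(paper_model_tokens, patents_by_code)
scores_patent = score_ipc_codes(patent_model_tokens, patents_by_code)
combined = combine_scores(scores_paper, scores_patent)
ranking = rank_codes(combined)
```

In this toy data the paper-domain translation alone cannot separate the two codes, while the patent-domain translation shares more vocabulary with the relevant patents, so the combined ranking places `G11C` first. Real systems would replace the overlap scorer with a trained classifier or a retrieval model.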

Similar resources

Integrating Query Translation and Text Classification in a Cross-Language Patent Access System

In this paper, a cross-language patent retrieval and classification system is presented to integrate the query translation using various free web translators on the internet and the document classification. The language-independent indexing method was used to process the multilingual patent documents, and the query translation method was used to translate the query from the source language to t...

Multi-label Classification using Logistic Regression Models for NTCIR-7 Patent Mining Task

We design a multi-label classification system based on a machine learning approach for the NTCIR-7 Patent Mining Task. In our system, we employ a logistic regression model for each International Patent Classification (IPC) code that determines the IPC code assignment of research papers. The logistic regression models are trained by using patent documents provided by task organizers. To mitigate...

An Automated Research Paper Classification Method for the IPC system with the Concept Base

In the present paper, a classification method using the Concept Base is proposed and evaluated in the Patent Mining Task of the NTCIR-7 workshop. In this task, research papers are classified into the International Patent Classification (IPC) system. The classification enables research papers to be located on a patent map. In order to classify a paper, patent documents that are similar to the pa...

The Patent Mining Task in the Seventh NTCIR Workshop

This paper introduces the Patent Mining Task in the Seventh NTCIR Workshop, which is currently in progress, and the test collections produced in this task. Its goal is the classification of research papers written either in Japanese or in English into the International Patent Classification (IPC) system, which is a global standard patent classification system.

Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques

Retrieving research papers and patents is important for any researcher assessing the scope of a field with high industrial relevance. However, the terms used in patents are often more abstract or creative than those used in research papers, because they are intended to widen the scope of claims. Therefore, a method is required for translating scholarly terms into patent terms. In this paper, we...

Journal title:

Volume   Issue 

Pages  -

Publication date: 2009